code chunk


Relative Positioning Based Code Chunking Method For Rich Context Retrieval In Repository Level Code Completion Task With Code Language Model

arXiv.org Artificial Intelligence

Code completion can help developers improve efficiency and ease the development lifecycle. Although code completion is available in modern integrated development environments (IDEs), little research has examined what makes a good context for code completion, given the information available to the IDE, so that large language models (LLMs) perform better. In this paper, we describe an effective context collection strategy to assist LLMs in performing better at code completion tasks. The key idea of our strategy is to preprocess the repository into smaller code chunks and later use syntactic and semantic similarity-based code chunk retrieval with relative positioning. We found that code chunking and relative positioning of the chunks in the final context improve the performance of code completion tasks.
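The abstract above does not spell out the chunking or retrieval details, so the following is only a minimal sketch of the general idea under stated assumptions: fixed-size line chunks, a token-overlap (Jaccard) score standing in for the paper's syntactic and semantic similarity, and "relative positioning" interpreted as keeping the retrieved chunks in their original file order. The function names (`chunk_lines`, `jaccard`, `build_context`) are hypothetical and do not come from the paper.

```python
import re


def chunk_lines(source: str, chunk_size: int = 4) -> list[tuple[int, str]]:
    """Split source into fixed-size line chunks, remembering each start line."""
    lines = source.splitlines()
    return [
        (start, "\n".join(lines[start:start + chunk_size]))
        for start in range(0, len(lines), chunk_size)
    ]


def tokens(text: str) -> set[str]:
    """Extract identifier-like tokens from a piece of code."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity; an assumed stand-in for the paper's scoring."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_context(source: str, query: str, top_k: int = 2, chunk_size: int = 4) -> str:
    """Retrieve the top_k most similar chunks, then restore their file order."""
    chunks = chunk_lines(source, chunk_size)
    query_tokens = tokens(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: jaccard(tokens(chunk[1]), query_tokens),
        reverse=True,
    )
    # Assumption: "relative positioning" means ordering by original position.
    selected = sorted(ranked[:top_k], key=lambda chunk: chunk[0])
    return "\n".join(text for _, text in selected)
```

In this sketch the ranking decides which chunks are kept, while the final sort decides where they appear in the context, so a highly ranked chunk from late in the file still follows an earlier one.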


Impact-driven Context Filtering For Cross-file Code Completion

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) has recently demonstrated considerable potential for repository-level code completion, as it integrates cross-file knowledge with in-file preceding code to provide comprehensive contexts for generation. To better understand the contribution of the retrieved cross-file contexts, we introduce a likelihood-based metric to evaluate the impact of each retrieved code chunk on the completion. Our analysis reveals that, despite retrieving numerous chunks, only a small subset positively contributes to the completion, while some chunks even degrade performance. To address this issue, we leverage this metric to construct a repository-level dataset where each retrieved chunk is labeled as positive, neutral, or negative based on its relevance to the target completion. We then propose an adaptive retrieval context filtering framework, CODEFILTER, trained on this dataset to mitigate the harmful effects of negative retrieved contexts in code completion. Extensive evaluation on the RepoEval and CrossCodeLongEval benchmarks demonstrates that CODEFILTER consistently improves completion accuracy compared to approaches without filtering operations across various tasks. Additionally, CODEFILTER significantly reduces the length of the input prompt, enhancing computational efficiency while exhibiting strong generalizability across different models. These results underscore the potential of CODEFILTER to enhance the accuracy, efficiency, and attributability of repository-level code completion.
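The abstract does not give the exact form of the likelihood-based metric, so the sketch below only illustrates the labeling idea: compare the log-likelihood of the target completion with and without a retrieved chunk and assign positive, neutral, or negative accordingly. The `margin` threshold and the function names are assumptions, and CODEFILTER itself is a trained filtering model rather than a threshold rule like this one.

```python
def label_chunk(logp_with: float, logp_without: float, margin: float = 0.1) -> str:
    """Label a chunk by how much it changes the target's log-likelihood.

    logp_with:    log-likelihood of the target completion with the chunk in context
    logp_without: log-likelihood of the target completion without the chunk
    margin:       hypothetical dead zone around zero that counts as "neutral"
    """
    delta = logp_with - logp_without
    if delta > margin:
        return "positive"
    if delta < -margin:
        return "negative"
    return "neutral"


def keep_positive(chunks: list[str], logps_with: list[float], logp_without: float,
                  margin: float = 0.1) -> list[str]:
    """Keep only the chunks labeled positive (a simple stand-in for filtering)."""
    return [
        chunk
        for chunk, logp in zip(chunks, logps_with)
        if label_chunk(logp, logp_without, margin) == "positive"
    ]
```

Dropping neutral and negative chunks also shortens the prompt, which matches the efficiency benefit the abstract reports for filtering.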


CoRet: Improved Retriever for Code Editing

arXiv.org Artificial Intelligence

In this paper, we introduce CoRet, a dense retrieval model designed for code-editing tasks that integrates code semantics, repository structure, and call graph dependencies. The model focuses on retrieving relevant portions of a code repository based on natural language queries such as requests to implement new features or fix bugs. These retrieved code chunks can then be presented to a user or to a second code-editing model or agent. To train CoRet, we propose a loss function explicitly designed for repository-level retrieval. On SWE-bench and Long Code Arena's bug localisation datasets, we show that our model substantially improves retrieval recall by at least 15 percentage points over existing models, and ablate the design choices to show their importance in achieving these results.
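The abstract reports gains in retrieval recall without defining the exact metric, so the following is a sketch of the common recall@k definition, assuming the ground truth is a set of relevant file or chunk identifiers. The function name and the example identifiers are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top-k retrieved results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical ranking returned by a retriever for one bug report.
ranking = ["utils/io.py", "core/parser.py", "core/lexer.py"]
ground_truth = {"utils/io.py", "core/lexer.py"}
recall_top2 = recall_at_k(ranking, ground_truth, k=2)
recall_top3 = recall_at_k(ranking, ground_truth, k=3)
```

An improvement of 15 percentage points in this measure would mean, for example, moving from 0.50 to 0.65 at the same k.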


Working on a Computer Vision project? These code chunks will help… – Towards AI

#artificialintelligence

Originally published on Towards AI, the World's Leading AI and Technology News and Media Company. "VR and AR will eventually converge, and smart glasses will take over our digital interactions."―


TensorFlow Tutorial For Beginners

#artificialintelligence

Note that you make use of global_variables_initializer() because the initialize_all_variables() function is deprecated. You have now successfully trained your model! That wasn't too hard, was it? You're not entirely there yet; you still need to evaluate your neural network. In this case, you can already get a glimpse of how well your model performs by picking 10 random images and comparing the predicted labels with the real labels. You can first print them out, but why not use matplotlib to plot the traffic signs themselves and make a visual comparison? However, only looking at random images doesn't give you much insight into how well your model actually performs.


Python Machine Learning: Scikit-Learn Tutorial

#artificialintelligence

Machine learning is a branch of computer science that studies the design of algorithms that can learn. Typical tasks are concept learning, function learning or "predictive modeling", clustering and finding predictive patterns. These tasks are learned through available data that were observed through experiences or instructions, for example. The hope that comes with this discipline is that including the experience into its tasks will eventually improve the learning. The ultimate goal is for this improvement to happen in such a way that the learning itself becomes automatic, so that humans like ourselves don't need to interfere anymore. Today's scikit-learn tutorial will introduce you to the basics of Python machine learning. If you're more interested in an R tutorial, take a look at our Machine Learning with R for Beginners tutorial. The first step in just about anything in data science is loading in your data. This is also the starting point of this scikit-learn tutorial. This discipline typically works with observed data. This data might be collected by yourself or you can browse through other sources to find data sets. But if you're not a researcher or otherwise involved in experiments, you'll probably do the latter.


TensorFlow Tutorial For Beginners – Hacker Noon

#artificialintelligence

Deep learning is a subfield of machine learning: a set of algorithms inspired by the structure and function of the brain. TensorFlow is the second machine learning framework that Google created and used to design, build, and train deep learning models. You can use the TensorFlow library to do numerical computations, which in itself doesn't seem all too special, but these computations are done with data flow graphs. In these graphs, nodes represent mathematical operations, while the edges represent the data, usually multidimensional data arrays or tensors, that is communicated between these nodes. The name "TensorFlow" is derived from the operations which neural networks perform on multidimensional data arrays or tensors! For now, this is all you need to know about tensors, but you'll go deeper into this in the next sections! Today's TensorFlow tutorial for beginners will introduce you to performing deep learning in an interactive way. You might also be interested in a course on Deep Learning in Python, DataCamp's Keras tutorial or the Keras with R tutorial. To understand tensors well, it's good to have some working knowledge of linear algebra and vector calculus. You already read in the introduction that tensors are implemented in TensorFlow as multidimensional data arrays, but a little more introduction may be needed in order to completely grasp tensors and their use in machine learning.
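To make the data flow graph idea concrete, here is a minimal pure-Python sketch in which nodes hold operations and the values passed from one node to the next play the role of tensors. It is not TensorFlow's API: the `Node` class and the `constant`, `multiply` and `add` helpers are hypothetical names chosen for illustration, and the "tensors" are plain lists.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """A graph node: an operation plus the nodes whose outputs feed into it."""
    op: Callable[..., list[float]]
    inputs: list["Node"] = field(default_factory=list)

    def evaluate(self) -> list[float]:
        # Data flows along the edges: evaluate the inputs first, then apply op.
        return self.op(*(node.evaluate() for node in self.inputs))


def constant(values: list[float]) -> Node:
    """A source node that simply emits a fixed 1-D array."""
    return Node(op=lambda: list(values))


def multiply(a: Node, b: Node) -> Node:
    """Element-wise product of two nodes' outputs."""
    return Node(op=lambda x, y: [p * q for p, q in zip(x, y)], inputs=[a, b])


def add(a: Node, b: Node) -> Node:
    """Element-wise sum of two nodes' outputs."""
    return Node(op=lambda x, y: [p + q for p, q in zip(x, y)], inputs=[a, b])


# Building the graph performs no arithmetic yet.
x1 = constant([1, 2, 3, 4])
x2 = constant([5, 6, 7, 8])
product = multiply(x1, x2)

# The computation only happens when the graph is evaluated.
result = product.evaluate()
```

Separating graph construction from evaluation mirrors the deferred style of execution that graph-based frameworks use, although real frameworks add much more, such as automatic differentiation and device placement.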


SciPy Tutorial: Linear Algebra

#artificialintelligence

An array is, structurally speaking, nothing but pointers. It contains information about the raw data, how to locate an element and how to interpret an element. The memory address and strides are important when you dive deeper into the lower-level details of arrays, while the data type and shape are things that beginners should surely know and understand. Two other attributes that you might want to consider are the data and size, which allow you to gather even more information on your array.
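How strides and data type combine to locate and interpret an element can be sketched with the standard library alone. This is an illustration of the idea, not NumPy's implementation: the helpers `c_strides` and `element_at` are hypothetical, the buffer is assumed to hold little-endian 4-byte integers in row-major (C) order, and no NumPy calls are used.

```python
import struct


def c_strides(shape: tuple[int, ...], itemsize: int) -> tuple[int, ...]:
    """Byte steps per dimension for a row-major (C-order) array."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def element_at(buffer: bytes, index: tuple[int, ...],
               strides: tuple[int, ...], fmt: str = "<i") -> int:
    """Locate an element by byte offset, then interpret the bytes using fmt."""
    offset = sum(i * s for i, s in zip(index, strides))
    return struct.unpack_from(fmt, buffer, offset)[0]


# Raw data: six little-endian 4-byte integers viewed as a 2 x 3 array.
raw = struct.pack("<6i", 10, 20, 30, 40, 50, 60)
shape = (2, 3)
strides = c_strides(shape, itemsize=4)    # 12 bytes per row, 4 bytes per column
value = element_at(raw, (1, 2), strides)  # row 1, column 2
```

Here the shape and strides answer "where is the element", and the format string (the data type) answers "how should these bytes be read", which is the division of roles the paragraph above describes.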


Python Machine Learning: Scikit-Learn Tutorial

#artificialintelligence

Machine learning is a branch of computer science that studies the design of algorithms that can learn. Typical tasks are concept learning, function learning or "predictive modeling", clustering and finding predictive patterns. These tasks are learned through available data that were observed through experiences or instructions, for example. The hope that comes with this discipline is that including the experience into its tasks will eventually improve the learning. The ultimate goal is for this improvement to happen in such a way that the learning itself becomes automatic, so that humans like ourselves don't need to interfere anymore. There are close ties between this discipline and Knowledge Discovery, Data Mining, Artificial Intelligence (AI) and Statistics. Typical applications can be classified into scientific knowledge discovery and more commercial ones, ranging from the "Robot Scientist" to anti-spam filtering and recommender systems. But above all, you will know this discipline because it's one of the topics that you need to master if you want to excel in data science. Today's scikit-learn tutorial will introduce you to the basics of Python machine learning: step by step, it will show you how to use Python and its libraries to explore your data with the help of matplotlib, work with the well-known algorithms KMeans and Support Vector Machines (SVM) to construct models, fit the data to these models, predict values and validate the models that you have built. The first step in just about anything in data science is loading in your data. This is also the starting point of this scikit-learn tutorial.